Statistical Sensitive Data Protection And Inference Prevention with Decision Tree Methods
Author
Abstract
We present a new approach for protecting sensitive data in a relational table (columns: attributes; rows: records). If unauthorized users can infer sensitive data from non-sensitive data, we have the inference problem. We consider inference as correct classification and approach it with decision tree methods. As in our previous work, sensitive data are viewed as the classes of the test data, and non-sensitive data are the remaining attribute values. In general, however, sensitive data may not be associated with one attribute (i.e., the class), but may be distributed among many attributes. We present a generalized decision tree method for distributed sensitive data. This method takes each attribute in turn as the class and analyzes the corresponding classification error. Attribute values that maximize an integrated error measure are selected for modification. Our analysis shows that modified attribute values can be restored and hence sensitive data are not securely protected. This result implies that modified values must themselves be subjected to protection. We present methods for this ramified protection problem and also discuss other sta-
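The step in which each attribute is taken in turn as the class can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the tree learner, the evaluation protocol, or the integrated error measure. The sketch assumes an ID3-style tree on categorical values and leave-one-out evaluation, and it reports the mean of the per-attribute errors only as a stand-in for an integrated measure. All function names and the sample table are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def build_tree(rows, cls, attrs):
    """Grow an ID3-style tree that predicts column `cls` from columns `attrs`.
    A leaf is a label; an inner node is (attribute, majority label, children)."""
    labels = [r[cls] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority

    def split_entropy(a):
        # Weighted entropy of the class labels after splitting on attribute a.
        groups = {}
        for r in rows:
            groups.setdefault(r[a], []).append(r[cls])
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

    best = min(attrs, key=split_entropy)
    rest = [a for a in attrs if a != best]
    branches = {}
    for r in rows:
        branches.setdefault(r[best], []).append(r)
    children = {v: build_tree(sub, cls, rest) for v, sub in branches.items()}
    return (best, majority, children)

def predict(tree, row):
    """Follow the tree; fall back to the node majority for unseen values."""
    while isinstance(tree, tuple):
        attr, majority, children = tree
        tree = children.get(row[attr], majority)
    return tree

def loo_error(rows, cls):
    """Leave-one-out error when column `cls` plays the role of the class
    (the evaluation protocol is an assumption of this sketch)."""
    attrs = [a for a in range(len(rows[0])) if a != cls]
    wrong = 0
    for i, held in enumerate(rows):
        tree = build_tree(rows[:i] + rows[i + 1:], cls, attrs)
        wrong += predict(tree, held) != held[cls]
    return wrong / len(rows)

def errors_by_class(rows):
    """Take each attribute in turn as the class and record its error."""
    return [loo_error(rows, c) for c in range(len(rows[0]))]

# Hypothetical table: rows are records, columns are categorical attributes.
table = [
    ("engineer", "high", "yes"),
    ("engineer", "high", "yes"),
    ("clerk", "low", "no"),
    ("clerk", "low", "yes"),
    ("manager", "high", "no"),
]
errs = errors_by_class(table)
print(errs, sum(errs) / len(errs))  # per-attribute errors and their mean
```

A low error for some column means its values are easy to infer from the other columns, so hiding that column's values alone would give little protection.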
Similar resources
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
Credit Card Fraud Detection using Data mining and Statistical Methods
Due to today’s advancement in technology and businesses, fraud detection has become a critical component of financial transactions. Given the vast amounts of data in large datasets, it becomes more difficult to detect fraudulent transactions manually. In this research, we propose a combined method using both data mining and statistical tasks, utilizing feature selection, resampling and cost-...
Evaluating the Performance of Decision Tree Models in Estimating Suspended River Sediment (Case Study: Ilam Dam Basin)
Accurately estimating the volume of sediment carried by rivers is very important in water projects. Identifying the most effective ways to calculate sediment discharge has therefore been the objective of many research projects. Among these methods are machine learning methods such as decision tree models, which are based on the principles of learning. Decision...
Multi-Dimensional Inference and Confidential Data Protection with Decision Tree Methods
We present a novel approach to the challenging issue of database confidential data protection. We adopt the decision tree framework as our baseline and extend it to cope with databases where the class label attribute is not specified. We are interested in confidential data that are randomly distributed over different attributes (referred to as multi-dimensional inference). For confidential data prote...
Evaluating the Performance of Various Artificial Intelligence Methods and a Statistical Method in Estimating Runoff (Case Study: Shahid Nouri Watershed, Kakhk, Gonabad)
Rainfall-runoff models have been used in hydrology and runoff estimation for many years, but despite the many existing models, the regular release of new ones shows that there is still no model that can provide sophisticated estimations with high accuracy and performance. In order to achieve the best results, modeling and identification of factors affecting the output of the model i...
Journal title:
Volume, issue:
Pages: -
Publication date: 2003